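In this paper, we address the problem of multi-view 3D shape reconstruction. Although recent differentiable rendering methods built on implicit shape representations have delivered breakthrough performance, they remain computationally heavy and often lack precision in the estimated geometry. To overcome these limitations, we investigate a new computational approach built on a novel shape representation that is volumetric, as in recent differentiable rendering methods, but parameterized with depth maps to better materialize the shape surface. The shape energy associated with this representation evaluates 3D geometry given color images and requires no appearance prediction, yet still benefits from volumetric integration when optimized. In practice, we propose an implicit shape representation, the SRDF, based on signed distances that we parameterize by depths along camera rays. The associated shape energy considers the agreement between depth prediction consistency and photometric consistency at 3D locations within the volumetric representation. Various photo-consistency priors can be accommodated, from a simple baseline to a more elaborate criterion such as a learned function. The approach retains the pixel accuracy of depth maps and is feasible in practice. Our experiments on standard datasets show that it provides state-of-the-art results compared with recent methods based on implicit shape representations as well as with traditional multi-view stereo methods.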
Translated by Google Translate
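As a rough illustration of the multi-view reconstruction abstract above, the sketch below computes a signed distance measured along a camera ray. It is not the authors' implementation: the function name, the sign convention (positive in front of the surface, negative behind it), and the choice to measure depth as Euclidean distance from the camera center are all assumptions made for the example.

```python
import math


def signed_ray_distance(point, camera_center, surface_depth):
    """Signed distance from `point` to the surface, measured along the camera ray.

    `surface_depth` is the depth predicted for the ray that passes through
    `point`, taken here as a Euclidean distance from the camera center
    (an assumption). Assumed sign convention: positive when the point lies
    between the camera and the surface, negative when it lies behind the surface.
    """
    return surface_depth - math.dist(point, camera_center)


# Hypothetical setup: camera at the origin, surface 2.0 units away along the z axis.
camera = (0.0, 0.0, 0.0)
depth_along_ray = 2.0
in_front = signed_ray_distance((0.0, 0.0, 1.5), camera, depth_along_ray)
behind = signed_ray_distance((0.0, 0.0, 2.5), camera, depth_along_ray)
print(in_front, behind)
```

Under these assumptions the first point gets a positive value and the second a negative one, so the zero crossing along the ray marks the surface encoded by the depth map.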
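Multi-person pose understanding from RGB videos involves three complex tasks: pose estimation, tracking, and motion forecasting. Among these three tasks, pose estimation and tracking are correlated, and tracking is crucial to motion forecasting. Most existing works either focus on a single task or employ cascaded methods that solve each task separately. In this paper, we propose Snipper, a framework that performs multi-person 3D pose estimation, tracking, and motion forecasting simultaneously in a single inference. Specifically, we first propose a deformable attention mechanism to aggregate spatiotemporal information from video snippets. Building on this deformable attention, a visual transformer is learned to encode spatiotemporal features from multi-frame images and to decode informative pose features that update multi-person pose queries. Finally, these queries are regressed to predict multi-person pose trajectories and future motions in one forward pass. In the experiments, we show the effectiveness of Snipper on three challenging public datasets, where a generic model rivals specialized state-of-the-art baselines for pose estimation, tracking, and forecasting. Code is available at https://github.com/jimmyzou/snipper.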
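Recently, data-driven single-view reconstruction methods have shown great progress in modeling 3D clothed humans. However, such methods suffer heavily from the depth ambiguities and occlusions inherent to single-view inputs. In this paper, we tackle this problem by considering a small set of input views and investigating the best strategy to suitably exploit the information from these views. We propose a data-driven end-to-end approach that reconstructs an implicit 3D representation of clothed humans from sparse camera views. Specifically, we introduce three key components: first, a spatially consistent reconstruction that uses a perspective camera model and allows arbitrary placement of the person in the input views; second, an attention-based fusion layer that aggregates visual information from several viewpoints; and third, a mechanism that encodes local 3D patterns under the multi-view context. In the experiments, we show that the proposed approach outperforms the state of the art on standard data, both quantitatively and qualitatively. To demonstrate the spatially consistent reconstruction, we apply our approach to dynamic scenes. In addition, we apply our method to real data acquired with a multi-camera platform and demonstrate that it obtains results comparable to multi-view stereo with far fewer views.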
In this paper, we propose ARCH (Animatable Reconstruction of Clothed Humans), a novel end-to-end framework for accurate reconstruction of animation-ready 3D clothed humans from a monocular image. Existing approaches to digitize 3D humans struggle to handle pose variations and recover details. Also, they do not produce models that are animation ready. In contrast, ARCH is a learned pose-aware model that produces detailed 3D rigged full-body human avatars from a single unconstrained RGB image. A Semantic Space and a Semantic Deformation Field are created using a parametric 3D body estimator. They allow the transformation of 2D/3D clothed humans into a canonical space, reducing ambiguities in geometry caused by pose variations and occlusions in training data. Detailed surface geometry and appearance are learned using an implicit function representation with spatial local features. Furthermore, we propose additional per-pixel supervision on the 3D reconstruction using opacity-aware differentiable rendering. Our experiments indicate that ARCH increases the fidelity of the reconstructed humans. We obtain more than 50% lower reconstruction errors for standard metrics compared to state-of-the-art methods on public datasets. We also show numerous qualitative examples of animated, high-quality reconstructed avatars unseen in the literature so far.
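Multi-agent market simulation is commonly used to create environments for downstream machine learning or reinforcement learning tasks, such as training or testing trading strategies before deploying them to real-time trading. In electronic trading markets, typically only the price or volume time series that result from the interaction of multiple market participants are directly observable. Multi-agent market environments therefore need to be calibrated so that the time series produced by the interaction of simulated agents resemble the historical ones, which amounts to solving a highly complex large-scale optimization problem. In this paper, we propose a simple and efficient framework for calibrating multi-agent market simulator parameters from historical time series observations. First, we consider a novel notion of eligibility to bypass the potential non-identifiability issue. Second, we generalize the two-sample Kolmogorov-Smirnov (K-S) test with Bonferroni correction to test the similarity between two high-dimensional time series distributions, which gives a simple yet effective distance metric between sets of time series samples. Third, we suggest using Bayesian optimization (BO) and trust-region BO (TuRBO) to minimize this distance metric. Finally, we demonstrate the efficiency of our framework using numerical experiments.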
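One plausible reading of the K-S distance with Bonferroni correction mentioned in the calibration abstract above is sketched below. This is an assumption, not the paper's definition: it runs a two-sample K-S comparison per dimension, takes the largest statistic as the distance, and compares it against the standard large-sample critical value at level alpha divided by the number of dimensions. The names `ks_statistic` and `ks_bonferroni` are hypothetical.

```python
import bisect
import math
import random


def ks_statistic(xs, ys):
    """Two-sample K-S statistic: the largest gap between the two empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    gap = 0.0
    for value in xs + ys:
        cdf_x = bisect.bisect_right(xs, value) / len(xs)
        cdf_y = bisect.bisect_right(ys, value) / len(ys)
        gap = max(gap, abs(cdf_x - cdf_y))
    return gap


def ks_bonferroni(samples_a, samples_b, alpha=0.05):
    """Return (distance, reject) for two sets of multi-dimensional samples.

    Each sample is a sequence with one value per dimension (e.g. per time step).
    The distance is the maximum per-dimension K-S statistic; `reject` applies the
    large-sample K-S critical value at the Bonferroni-corrected level alpha / n_dims.
    """
    n_dims = len(samples_a[0])
    n, m = len(samples_a), len(samples_b)
    corrected_alpha = alpha / n_dims
    critical = math.sqrt(-0.5 * math.log(corrected_alpha / 2)) * math.sqrt((n + m) / (n * m))
    distance = max(
        ks_statistic([s[j] for s in samples_a], [s[j] for s in samples_b])
        for j in range(n_dims)
    )
    return distance, distance > critical


# Synthetic stand-ins for historical and simulated series (illustrative data only).
rng = random.Random(0)
historical = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(200)]
well_calibrated = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(200)]
mis_calibrated = [[rng.gauss(3.0, 1.0) for _ in range(5)] for _ in range(200)]

d_close, _ = ks_bonferroni(historical, well_calibrated)
d_far, reject_far = ks_bonferroni(historical, mis_calibrated)
print(d_close, d_far, reject_far)
```

In a calibration loop, a distance of this kind would serve as the objective that the optimizer minimizes over simulator parameters; the optimizer itself is not shown here.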
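Rare-event simulation techniques, such as importance sampling (IS), are powerful tools for accelerating the challenging estimation of rare catastrophic events. These techniques often leverage knowledge and analysis of the underlying system structure to provide desirable efficiency guarantees. However, black-box problems, especially those arising from recent safety-critical applications of AI-driven physical systems, can fundamentally undermine these efficiency guarantees and lead to dangerous estimates that go undetected by diagnostics. We propose a framework called Deep Probabilistic Accelerated Evaluation (Deep-PrAE) to design IS with statistical guarantees, by converting black-box samplers that are versatile but may lack guarantees into ones with what we call a relaxed efficiency certificate, which allows accurate estimation of bounds on the rare-event probability. We present the theory of Deep-PrAE, which combines the dominating point concept with rare-event set learning via deep neural network classifiers, and demonstrate its effectiveness in numerical examples, including the safety testing of intelligent driving algorithms.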
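For readers unfamiliar with importance sampling, the following minimal sketch shows the textbook technique on a toy problem. It is not Deep-PrAE itself: it estimates P(X > 4) for a standard normal X by sampling from a normal distribution shifted to the threshold and reweighting each hit by the likelihood ratio. The function name and the choice of proposal are assumptions made for the example.

```python
import math
import random


def rare_event_probability_is(threshold, n_samples, rng):
    """Estimate P(X > threshold) for X ~ N(0, 1) by importance sampling.

    Samples are drawn from the shifted proposal N(threshold, 1), and each sample
    that lands in the rare set is weighted by the likelihood ratio
    phi(x) / phi(x - threshold) = exp(-threshold * x + threshold**2 / 2).
    """
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n_samples


rng = random.Random(0)
estimate = rare_event_probability_is(4.0, 20_000, rng)
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))  # closed-form Gaussian tail probability
print(estimate, exact)
```

Naive Monte Carlo with the same budget would typically see no samples beyond the threshold at all, whereas the shifted proposal places about half of its samples in the rare set. In this toy case the structure of the rare set is known; the abstract concerns the harder black-box setting where it must be learned.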
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes from only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After these steps, performance on novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
In this chapter, we review and discuss the transformation of AI technology in HCI/UX work and assess how AI technology will change how we do the work. We first discuss how AI can be used to enhance the result of user research and design evaluation. We then discuss how AI technology can be used to enhance HCI/UX design. Finally, we discuss how AI-enabled capabilities can improve UX when users interact with computing systems, applications, and services.
Micro-expressions (MEs), among the most important psychological stress reactions, are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Automatic ME recognition (MER) is therefore becoming increasingly crucial in the field of affective computing, and provides essential technical support for lie detection, psychological analysis, and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Although several spontaneous ME datasets have recently been released to alleviate this problem, the amount of available data remains small. To solve the problem of ME data hunger, we construct a dynamic spontaneous ME dataset with the largest ME data scale to date, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos elicited from 671 participants and annotated by more than 20 annotators over three years. We then apply four classical spatiotemporal feature learning models to DFME in MER experiments to objectively verify the validity of the DFME dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate research on automatic MER and provide a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.
Face Anti-spoofing (FAS) is essential to secure face recognition systems from various physical attacks. However, recent research generally focuses on short-distance applications (e.g., phone unlocking) while lacking consideration of long-distance scenes (e.g., surveillance security checks). In order to promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In this setting, low image resolution and noise interference are new challenges for surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) An Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by combining the super-resolution network. (2) Generated sample pairs are used to simulate quality variance distributions, helping contrastive learning strategies obtain robust feature representations under quality variation. (3) A Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, extensive experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.